Circadian rhythm is an approximately 24 hour oscillation of biological processes that exists in organisms, including humans. Over the last two decades, various algorithms have been developed to detect these circadian rhythm. Here, we present the usage of seven algorthims, including Lomb Scargle, ARSER, JTK_CYCLE, RAIN, eJTK_CYCLE, MetaCycle, and BIO_CYCLE.
LS is an algorithm adopted from the field of astrophysics that detects oscillations using sinusoidal fits method.It is one of the earlier developed algorithms that can effectively deal with missing values by treating them as uneven sampling points.
ARS is an algorithm that applies autoregressive spectral estimation to predict the periodic length of the time-series, and uses harmonic regression model (sinusoidal fits) to fit circadian rhythms and predict its parameters such as period, phase, amplitude, etc.
JTK is a non-parametric, sinusoidal fits model that identifies oscillations by computing Kendall’s tau correlation between the time series and the specified reference sinusoidal curves.
MetaCycle is an integrated R-package of LS, ARS, and JTK. It enables users to select one or multiple methods to evaluate periodicity.
Below is an example of R code for individual analysis with LS and its result.
library(MetaCycle)
meta2d(infile = '../data/ds_hughes2009_meta.csv', outputFile = T, filestyle = 'csv',
timepoints = seq(18,64,2), cycMethod="LS", outIntegration = 'noIntegration', maxper = 28, minper = 20, outdir = '../data')
Change cycMethod= LS to cycMethod= ARS or cycMethod= JTK for individual anallysis with ARS or JTK.
Below is an example of integrated MetaCycle analysis with all three algorithms.
library(MetaCycle)
meta2d(infile = '../data/ds_hughes2009_meta.csv', outputFile = T, filestyle = 'csv',
timepoints = seq(18,64,2), outIntegration = 'onlyIntegration', maxper = 28, minper = 20, outdir = '../data')
RAIN builds on the strengths of JTK and improves its detectability with an additional set of asymmetric waveforms.
peak.border can specify the different form of the peak.
nr.series can specify the number of replicates.
measure.sequence (not shown) can be used to specify the uneven sampling.
\(q\)-values are calculated with qvalue() in R package qvalue.
Below shows an example of RAIN execution code and its output.
library(rain)
library(qvalue)
rain_ds_hughes2009 <- rain(t(ds_hughes2009[,-1]), deltat = 2, period = 24, period.delta = 4,
peak.border = c(0.3, 0.7), nr.series = 1, adjp.method = "ABH")
rain_ds_hughes2009$qvalue <- qvalue(rain_ds_hughes2009$pVal)$qvalue
eJTK can not only effectively dectect asymmetric waveforms, but also takes a step further to correct the initial p-values with empirical p-values calculated from permutation.
-w specifies the waveform to be compared, it is cosine as default;
-p specifies the period, 24, because eJTK can only detect one single period per time.
-s and -a specify phases and different form of the peak, respectively;
-x specifies the suffix of output file;
-f speficies the directory of input file.
Note: GammaP will vary when running the same code on the same dataset repeatedly.
Below shows an example of eJTK execution code and its output.
./eJTK/bin/eJTK-CalcP.py -w ./eJTK/ref_files/waveform_cosine.txt -p ./eJTK/ref_files/period24_1.txt -s ./eJTK/ref_files/phases_00-22_by2.txt -a ./eJTK/ref_files/asymmetries_02-22_by2.txt -x cos24_ph0-22_by2_a02-22_by2_OTHERTEXT -f ../data/ds_hughes2009.txt
BIO_CYCLE is a deep neural network (DNN) method trained from both simulated and empirical circadian and noncircadian time-series.
-i specifies the input file (.tsv);
-s specifies the starting (minimal) period;
-e specifies ending (maximum) period;
-o specifies the output directory in which the result file (.tsv) will be saved.
Below shows an example of BC execution code and its output.
Rscript ./BioCycle/BioCycle.R -i ../data/ds_hughes2009.tsv -s 20 -e 28 -o ../data/
We evaluate the performance of each method with four measures: receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), precision, and recall. These measurements are calculated with our benchmark genes, which consist of 104 true circadian genes and 113 non-circadian genes.
ROC curves and AUC values are computed as the joint measures of specificity and sensitivity to determine the methods’ strengths of classifying circadian and non-circadian transcriptomes. ROC curve with high AUC value suggests the method can effectively identify a large number of true circadian genes while controlling for a high false positive rate.
Precision and recall rates are calculated to characterize the accuracy of each algorithm in terms of detection of circadian transcriptomes. The formulas for calculating \(precision\) and \(recall\) are shown below.
\[precision=\frac{TP}{TP+FP}\] \[recall=\frac{TP}{TP+FN}\]
We use \(p\)-value thresholds of 0.000005, 0.00005, 0.0005, and \(q\)-value threshold of 0.05 to calculate the precision and recall rates for all methods.
Below is a table of the calculated precision and recall rates for each method.Here is a visual representation of the comparisons among the methods.
Below is a list of each method’s pros and cons that we’ve summarized from our benchmark assessment and literature reviews.